OrpheusDB: Bolt-on Versioning for Relational Databases

Authors

  • Silu Huang
  • Liqi Xu
  • Jialin Liu
  • Aaron J. Elmore
  • Aditya G. Parameswaran

Abstract

Data science teams often collaboratively analyze datasets, generating dataset versions at each stage of iterative exploration and analysis. There is a pressing need for a system that can support dataset versioning, enabling such teams to efficiently store, track, and query across dataset versions. While Git and SVN are highly effective at managing code, they are not capable of managing large unordered structured datasets efficiently, nor do they support analytic (SQL) queries on such datasets. We introduce OrpheusDB, a dataset version control system that “bolts on” versioning capabilities to a traditional relational database system, thereby gaining the analytics capabilities of the database “for free”, while the database itself is unaware of the presence of dataset versions. We develop and evaluate multiple data models for representing versioned data, as well as a light-weight partitioning scheme, LyreSplit, to further optimize the models for reduced query latencies. With LyreSplit, OrpheusDB is on average 10× faster in finding effective (and better) partitionings than competing approaches, while also reducing the latency of version retrieval by up to 20× relative to schemes without partitioning. LyreSplit can be applied in an online fashion as new versions are added, alongside an intelligent migration scheme that reduces migration time by 10× on average.
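The abstract names the ingredients (versioning bolted onto an unmodified relational database, data models for versioned records, and partitioning to cut version-retrieval latency) but not their details. The Python sketch below illustrates the general idea under stated assumptions only. It uses SQLite from the standard library, a hypothetical two-table layout (`data` holds each distinct record once, `version_members` maps version ids to record ids), and hypothetical helpers `create_schema`, `commit_version`, `checkout`, and `partition_costs`. None of these are OrpheusDB's actual schema or API, and `partition_costs` applies an assumed storage/checkout cost model, not the LyreSplit algorithm.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical layout (not OrpheusDB's): each distinct record is stored
    # once in `data`; `version_members` records which rows belong to which version.
    conn.executescript(
        """
        CREATE TABLE data (
            rid   INTEGER PRIMARY KEY,
            name  TEXT NOT NULL,
            score INTEGER NOT NULL
        );
        CREATE TABLE version_members (
            vid INTEGER NOT NULL,
            rid INTEGER NOT NULL REFERENCES data(rid),
            PRIMARY KEY (vid, rid)
        );
        """
    )


def commit_version(conn: sqlite3.Connection, vid: int,
                   rows: list[tuple[str, int]]) -> None:
    # Reuse an existing record when an identical row is already stored,
    # so unchanged rows are shared across versions instead of copied.
    for name, score in rows:
        hit = conn.execute(
            "SELECT rid FROM data WHERE name = ? AND score = ?", (name, score)
        ).fetchone()
        if hit:
            rid = hit[0]
        else:
            rid = conn.execute(
                "INSERT INTO data (name, score) VALUES (?, ?)", (name, score)
            ).lastrowid
        conn.execute(
            "INSERT OR IGNORE INTO version_members (vid, rid) VALUES (?, ?)",
            (vid, rid),
        )


def checkout(conn: sqlite3.Connection, vid: int) -> list[tuple[str, int]]:
    # Materialize one version with an ordinary join; the database itself
    # has no notion of versions.
    return conn.execute(
        "SELECT d.name, d.score FROM data AS d "
        "JOIN version_members AS m ON d.rid = m.rid "
        "WHERE m.vid = ? ORDER BY d.name",
        (vid,),
    ).fetchall()


def partition_costs(version_records: dict[int, set[int]],
                    partitions: list[list[int]]) -> tuple[int, float]:
    # Assumed cost model: a partition stores the union of its versions'
    # records, and checking out a version scans its entire partition.
    storage, scanned = 0, 0
    for group in partitions:
        records = set().union(*(version_records[v] for v in group))
        storage += len(records)
        scanned += len(group) * len(records)
    return storage, scanned / len(version_records)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    commit_version(conn, 1, [("alice", 90), ("bob", 75)])
    # Version 2 edits bob and adds carol; alice's record is shared.
    commit_version(conn, 2, [("alice", 90), ("bob", 80), ("carol", 60)])
    print(checkout(conn, 1))  # [('alice', 90), ('bob', 75)]
    print(checkout(conn, 2))  # [('alice', 90), ('bob', 80), ('carol', 60)]
    # Cross-version analytics are plain SQL over the same tables.
    print(conn.execute(
        "SELECT m.vid, COUNT(*), SUM(d.score) FROM data AS d "
        "JOIN version_members AS m ON d.rid = m.rid "
        "GROUP BY m.vid ORDER BY m.vid"
    ).fetchall())  # [(1, 2, 165), (2, 3, 230)]
    # Record ids per version from the run above: v1 -> {1, 2}, v2 -> {1, 3, 4}.
    members = {1: {1, 2}, 2: {1, 3, 4}}
    print(partition_costs(members, [[1, 2]]))    # (4, 4.0)
    print(partition_costs(members, [[1], [2]]))  # (5, 2.5)
```

Because version membership is just another table in this sketch, checkout and cross-version queries are ordinary SQL that the database runs without knowing versions exist. Under the assumed cost model, splitting the two versions into separate partitions lowers the average number of records scanned per checkout (4.0 to 2.5) while duplicating the shared record (storage 4 to 5), which is the kind of trade-off a partitioning scheme has to balance.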

Related articles

Towards a Model for Spatio-Temporal Schema

Schema versioning provides a mechanism for handling change in the structure of database systems and has been investigated widely, both in the context of static and temporal databases. With the growing interest in spatial and spatio-temporal data as well as the mechanisms for holding such data, the spatial context within which data is formatted also becomes an issue. This paper presents a genera...

On Schema Versioning in Temporal Databases

The support of schema versioning has been considered in the literature on temporal databases only at a limited extent. In particular, solutions for managing schema versions along transaction-time as different interfaces on the same temporal data were proposed so far. In this paper we investigate the distinct functionalities of new solutions for schema versioning along valid- and transaction-time ...

A formal model for temporal schema versioning in object-oriented databases

The problem of supporting temporal schema versioning has been extensively studied in the context of the relational model. In the object-oriented environment, previous works were devoted to the study of the different aspects of schema evolution or (non-temporal) versioning in branching models, due to the traditional origination of the object-oriented model from CAD/CAM and CIM. Nowadays, the com...

Towards a Model for Spatio-Temporal Schema Selection

Schema versioning provides a mechanism for handling change in the structure of database systems and has been investigated widely, both in the context of static and temporal databases. With the growing interest in spatial and spatio-temporal data as well as the mechanisms for holding such data, the spatial context within which data is formatted also becomes an issue. This paper presents a genera...

A Taxonomy for Schema Versioning Based on the Relational and Entity Relationship Models

Recently there has been increasing interest in both the problems and the potential of accommodating evolving schema in databases, especially in systems which necessitate a high volume of structural changes or where structural change is difficult. This paper presents a taxonomy of changes applicable to the Entity-Relationship Model together with their effects on the underlying relational model e...

Journal:
  • PVLDB

Volume: 10

Publication date: 2017